Implement batched serial tbsv #1

Draft
wants to merge 14 commits into
base: master

yasahi-hpc
Member

This is a draft PR to discuss the implementation strategy for the tbsv function.

If I understand correctly, the minimal set of files for each kernel is

  1. KokkosBatched_Tbsv_Serial_Impl.hpp: Internal interfaces
  2. KokkosBatched_Tbsv_Serial_Internal.hpp: Implementation details
  3. KokkosBatched_Tbsv_Decl.hpp: APIs
  4. Test_Batched_SerialTbsv.hpp: Unit tests for that

Detailed description

It solves the triangular banded system Ax = b.
The matrix and vector have the following shapes.

  • A: (batch_count, lda, n)
    An n by n unit or non-unit, upper or lower triangular band matrix with (k+1) diagonals, held in band storage.
  • x: (batch_count, n)
    On entry, x must contain the n-element right-hand side vector b. On exit, it is overwritten with the solution.
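
To make the operation concrete, here is a minimal reference sketch of the solve for a single batch entry in plain C++, without Kokkos. It assumes the BLAS band-storage convention (upper: A(i, j) in band row k + i - j; lower: A(i, j) in band row i - j), a row-major (k + 1) x n array, and no transpose. The name `tbsv_reference` and its signature are hypothetical and are not the interface proposed in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Solve A x = b for one system (no transpose), overwriting x, which holds b
// on entry. Ab is the (k + 1) x n band storage of A in row-major order, i.e.
// element (r, j) is Ab[r * n + j]. Illustrative only, not the PR's API.
void tbsv_reference(bool upper, bool unit_diag, std::size_t n, std::size_t k,
                    const std::vector<double> &Ab, std::vector<double> &x) {
  assert(Ab.size() == (k + 1) * n && x.size() == n);
  if (upper) {
    // Back substitution: A(i, j) lives in band row k + i - j.
    for (std::size_t j = n; j-- > 0;) {
      if (!unit_diag) x[j] /= Ab[k * n + j];
      const std::size_t i0 = (j > k) ? j - k : 0;
      for (std::size_t i = i0; i < j; ++i)
        x[i] -= x[j] * Ab[(k + i - j) * n + j];
    }
  } else {
    // Forward substitution: A(i, j) lives in band row i - j.
    for (std::size_t j = 0; j < n; ++j) {
      if (!unit_diag) x[j] /= Ab[j];
      const std::size_t i1 = (j + k < n) ? j + k : n - 1;
      for (std::size_t i = j + 1; i <= i1; ++i)
        x[i] -= x[j] * Ab[(i - j) * n + j];
    }
  }
}
```

As a hand-checkable case, the lower matrix with rows (2 0 0), (1 2 0), (0 1 2) and b = (2, 5, 8) gives x = (1, 2, 3). With the unit-diagonal option, the stored diagonal entries are never read.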

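Example of a single batch entry of matrix A with n = 10 and k = 3, shown first in dense form (upper, lower) and then in band storage (upper_banded, lower_banded).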

upper
1 1 1 1 0 0 0 0 0 0 
0 1 1 1 1 0 0 0 0 0 
0 0 1 1 1 1 0 0 0 0 
0 0 0 1 1 1 1 0 0 0 
0 0 0 0 1 1 1 1 0 0 
0 0 0 0 0 1 1 1 1 0 
0 0 0 0 0 0 1 1 1 1 
0 0 0 0 0 0 0 1 1 1 
0 0 0 0 0 0 0 0 1 1 
0 0 0 0 0 0 0 0 0 1 

lower
1 0 0 0 0 0 0 0 0 0 
1 1 0 0 0 0 0 0 0 0 
1 1 1 0 0 0 0 0 0 0 
1 1 1 1 0 0 0 0 0 0 
0 1 1 1 1 0 0 0 0 0 
0 0 1 1 1 1 0 0 0 0 
0 0 0 1 1 1 1 0 0 0 
0 0 0 0 1 1 1 1 0 0 
0 0 0 0 0 1 1 1 1 0 
0 0 0 0 0 0 1 1 1 1 

upper_banded
0 0 0 1 1 1 1 1 1 1 
0 0 1 1 1 1 1 1 1 1 
0 1 1 1 1 1 1 1 1 1 
1 1 1 1 1 1 1 1 1 1 

lower_banded
1 1 1 1 1 1 1 1 1 1 
1 1 1 1 1 1 1 1 1 0 
1 1 1 1 1 1 1 1 0 0 
1 1 1 1 1 1 1 0 0 0 
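
The mapping from the dense forms to the banded forms above can be sketched as follows. This is an illustration that assumes the BLAS convention, which matches the patterns shown: for upper, A(i, j) goes to band row k + i - j, so the diagonal sits in the last row; for lower, A(i, j) goes to band row i - j, so the diagonal sits in the first row. The `Matrix` helper and `dense_to_band` are hypothetical names, not part of the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal row-major dense matrix (hypothetical helper for illustration).
struct Matrix {
  std::size_t rows, cols;
  std::vector<double> data;
  Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
  double &operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
  double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Pack an n x n triangular band matrix with k off-diagonals into
// (k + 1) x n band storage. Unused corner entries stay zero.
Matrix dense_to_band(const Matrix &A, std::size_t k, bool upper) {
  assert(A.rows == A.cols);
  const std::size_t n = A.rows;
  Matrix Ab(k + 1, n);
  for (std::size_t j = 0; j < n; ++j) {
    if (upper) {
      // Column j holds rows max(0, j - k) .. j; the diagonal lands in row k.
      const std::size_t i0 = (j > k) ? j - k : 0;
      for (std::size_t i = i0; i <= j; ++i) Ab(k + i - j, j) = A(i, j);
    } else {
      // Column j holds rows j .. min(n - 1, j + k); the diagonal lands in row 0.
      const std::size_t i1 = (j + k < n) ? j + k : n - 1;
      for (std::size_t i = j; i <= i1; ++i) Ab(i - j, j) = A(i, j);
    }
  }
  return Ab;
}
```

Applied to the 10 by 10 examples with k = 3, this produces the 4 by 10 arrays shown: leading zeros in the upper rows of upper_banded and trailing zeros in the lower rows of lower_banded.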

The batch would be parallelized as follows, with one serial solve per batch entry. This is efficient only when
A is stored in LayoutLeft on GPUs and LayoutRight on CPUs, because the parallel loop runs over the batch direction.

// One serial solve per batch entry; k is the number of off-diagonals.
Kokkos::parallel_for("tbsv",
    Kokkos::RangePolicy<execution_space, ParamTagType>(0, batch_count),
    KOKKOS_LAMBDA(const ParamTagType &, const int ib) {
        // Slice out the band matrix and right-hand side of batch entry ib.
        auto aa = Kokkos::subview(_a, ib, Kokkos::ALL(), Kokkos::ALL());
        auto bb = Kokkos::subview(_b, ib, Kokkos::ALL());

        KokkosBatched::SerialTbsv<
            typename ParamTagType::uplo, typename ParamTagType::trans,
            typename ParamTagType::diag, AlgoTagType>::invoke(aa, bb, k, incx);
    });
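
The layout remark can be illustrated by writing out the linear offset of element (b, i, j) of A under each layout. This sketch assumes unpadded storage with extents (batch_count, lda, n); the two function names are hypothetical. In LayoutLeft the leftmost index is contiguous, so neighbouring batch entries are one element apart, which suits threads that each take one batch entry on a GPU. In LayoutRight each batch entry occupies its own contiguous block of lda * n elements, which suits one CPU thread per entry.

```cpp
#include <cassert>
#include <cstddef>

// LayoutLeft: the leftmost (batch) index has stride 1.
std::size_t offset_layout_left(std::size_t b, std::size_t i, std::size_t j,
                               std::size_t batch_count, std::size_t lda) {
  assert(b < batch_count && i < lda);
  return b + batch_count * (i + lda * j);
}

// LayoutRight: the rightmost (column) index has stride 1.
std::size_t offset_layout_right(std::size_t b, std::size_t i, std::size_t j,
                                std::size_t lda, std::size_t n) {
  assert(i < lda && j < n);
  return j + n * (i + lda * b);
}
```

For the example sizes lda = 4 and n = 10, moving from batch entry b to b + 1 advances the offset by 1 under LayoutLeft and by 40 under LayoutRight.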

Tests

First, construct an upper or lower triangular band matrix A in dense form. Then convert it to band storage Ab. Solve Ax = b with trsv and (Ab)x = b with tbsv, and compare the tbsv result against the trsv result.
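
The comparison strategy can be sketched in plain C++ for the upper, non-unit, no-transpose case. This is an illustration under assumptions, not the PR's test: the function names are hypothetical, the matrix entries are chosen arbitrarily with a dominant diagonal, and both solvers are simple reference loops standing in for trsv and tbsv.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Dense reference: back substitution on a row-major n x n upper triangular A.
Vec trsv_upper_dense(std::size_t n, const Vec &A, Vec x) {
  assert(A.size() == n * n && x.size() == n);
  for (std::size_t j = n; j-- > 0;) {
    x[j] /= A[j * n + j];
    for (std::size_t i = 0; i < j; ++i) x[i] -= x[j] * A[i * n + j];
  }
  return x;
}

// Band solve: Ab is (k + 1) x n, row-major, with A(i, j) in row k + i - j.
Vec tbsv_upper_band(std::size_t n, std::size_t k, const Vec &Ab, Vec x) {
  assert(Ab.size() == (k + 1) * n && x.size() == n);
  for (std::size_t j = n; j-- > 0;) {
    x[j] /= Ab[k * n + j];
    for (std::size_t i = (j > k ? j - k : 0); i < j; ++i)
      x[i] -= x[j] * Ab[(k + i - j) * n + j];
  }
  return x;
}

// Build the same upper band matrix in dense and band form, solve both ways,
// and return the largest absolute difference between the two solutions.
double max_trsv_tbsv_difference(std::size_t n, std::size_t k) {
  Vec A(n * n, 0.0), Ab((k + 1) * n, 0.0), b(n);
  for (std::size_t j = 0; j < n; ++j) {
    b[j] = 1.0 + static_cast<double>(j);
    for (std::size_t i = (j > k ? j - k : 0); i <= j; ++i) {
      // A dominant diagonal keeps the solve numerically benign.
      const double v = (i == j) ? 4.0 : 1.0 / static_cast<double>(1 + j - i);
      A[i * n + j] = v;
      Ab[(k + i - j) * n + j] = v;
    }
  }
  const Vec xd = trsv_upper_dense(n, A, b);
  const Vec xb = tbsv_upper_band(n, k, Ab, b);
  double diff = 0.0;
  for (std::size_t i = 0; i < n; ++i)
    diff = std::max(diff, std::fabs(xd[i] - xb[i]));
  return diff;
}
```

A test would call `max_trsv_tbsv_difference(10, 3)` and require the result to be below a small tolerance. The same helper also covers the corner cases n = 0 and k = 0.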

@yasahi-hpc yasahi-hpc marked this pull request as draft April 18, 2024 12:34
@yasahi-hpc
Member Author

For the moment, the incx parameter is ignored for simplicity.

@yasahi-hpc
Member Author

yasahi-hpc commented Apr 25, 2024

TODO tasks

  1. Development
      • The public interface should be KokkosBatched_Tbsv.hpp, not KokkosBatched_Tbsv_Decl.hpp. This follows the recent convention.
      • Check the matrix format. It should be a rank-2 matrix (already confirmed) with the appropriate storage scheme as described in banded storage format.
      • Enable the incx parameter.
  2. Tests
      • Add a simple and small analytical test, i.e. choose A and b, compute x by hand, then check that it matches the routine's result.
      • Cover some corner cases: what happens with an empty matrix, zero upper/lower diagonals, zeros on the diagonal, etc.
      • Generate random matrices for A and x, compute b with a matvec, and then check that solving Ax = b returns the original x vector. (This is partially done: Tbsv is compared with Trsv.)
      • Ideally, if there are any assertions or runtime checks, verify that they catch what they are supposed to catch.
      • Finally, the tests should cover all backends on which the algorithm is supposed to run. (Serial, OpenMP, Threads, CUDA, and HIP can be tested.)

@yasahi-hpc
Member Author

Most of the modifications listed above have been made.
